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The CUBA Library 

T. Hahn 

Max-Planck-Institut fur Physik 

Fohringer Ring 6, D-80805 Munich, Germany 

Concepts and implementation of the Cuba library for multidimensional numerical integration are elucidated. 



1. Overview 

Cuba ^] is a library for multidimensional nu- 
merical integration. It features four integration 
algorithms with interfaces for Fortran, C/C++, 
and Mathcmatica. All four can integrate vector 
integrands and their invocation is quite similar to 
make it easy to switch routines for comparison. 
The main characteristics are summarized below. 

Routine: Vegas 

Basic integration methods available: 

• Sobol sample (quasi Monte Carlo) 

• Mersenne Twister sample (pseudo Monte Carlo) 
Variance reduction: importance sampling 

Routine: Suave 

Basic integration methods available: 

• Sobol sample (quasi Monte Carlo) 

• Mersenne Twister sample (pseudo Monte Carlo) 
Variance reduction: importance sampling com- 
bined with globally adaptive subdivision 

Routine: Divonne 

Basic integration methods available: 

• Korobov sample (lattice method) 

• Sobol sample (quasi Monte Carlo) 

• Mersenne Twister sample (pseudo Monte Carlo) 

• cubature rules (deterministic method) 
Variance reduction: stratified sampling, aided by 
methods from numerical optimization 

Routine: Cuhre 

Basic integration method available: 

• cubature rules (deterministic method) 
Variance reduction: globally adaptive subdivi- 
sion 

Before explaining the buzzwords appearing in 
this list, it is perhaps worthwhile to address two 
frequently asked questions. 



First, numerical integration rapidly becomes 
more difficult with increasing dimension, no mat- 
ter how many tricks are built into the integra- 
tor. To illustrate this, imagine computing the 
volume of the d-diva. sphere Sd by integrating 
its characteristic function x = $(1 — II ^11 2) in- 
side the surrounding hypcrcube d = [—1, l] d . In 
a Monte Carlo way of thinking, then, the ratio 
r = Vol Sd/ Vol Cd can be taken as the chance 
that a general-purpose integrator will find the 
sphere at all. The numbers clearly display what 
is often termed the 'curse of dimensionality': 



(1) 



Second, Cuba (and, for that matter, most mul- 
tidimensional integrators) can do only Ricmann 
integrals of the form 
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I/:= f d d xf(x). 
Jo 



(2) 



Most questions concern the boundaries, although 
it is straightforward to transform a finite integra- 
tion region to the unit hypercube: 

d d xf(x) = [ d d yJf(x), (3) 

ai J a d JO 
d 

J = Y\_( b i ~ a i) i X i = a 'i + ( b i ~ a 0?/» ■ 
i=l 

2. Concepts 

2.1. Deterministic vs. Monte Carlo 

Cuba contains both deterministic and Monte 
Carlo integration methods. The deterministic ap- 
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proach is based on cubature rules, 

n 

I/«C„/ (4) 

i=l 

with specially chosen nodes Xi and weights Wi. 
Error estimation is done e.g. by null rules N m 
(771 < n) which are constructed to give zero for 
functions integrated exactly by C„ and thus mea- 
sure errors due to "higher terms." 

The Monte Carlo estimate, although quite sim- 
ilar in form, 

1 - 

I/wMn/ ( 5 ) 
n z — ' 

i=l 

is conceptually very different as this formula de- 
notes the statistical average over independent and 
identically distributed random samples 2% . In this 
case the standard deviation furnishes a proba- 
bilistic estimate of the integration error: 

a(M„/) = VM n /2-M2/. (6) 

2.2. Construction of Cubature Rules 

Starting from an orthogonal basis of functions 
b m } - usually monomials - with which 
most / can (hopefully) be approximated suffi- 
ciently well one imposes that each bi be integrated 
exactly by C n : I bi = C n bi. These are m moment 
equations 

n .1 
y^w k b l {x k )= \ d d xbi{x) (7) 
fe=i Jo 

for nd + n unknowns at, and i/j^. They pose a 
formidable, in general nonlinear, system of equa- 
tions. Additional assumptions, e.g. symmetries, 
are usually necessary to solve this system. Cuba 
employs the Genz-Malik rules [2J constructed 
from a symmetric monomial basis. 

2.3. Globally Adaptive Subdivision 

Once an error estimate for the integral is avail- 
able, global adaptiveness is easy to implement: 

1. Integrate the entire region: J to t ± E to t- 

2. while E tot > max(e re i/tot, e a bs) 

3. Find the region r with the largest error. 



4. Bisect (or otherwise cut up) r. 

5. Integrate each subregion of r separately. 

6. /tot = E^> E tot = s /YM- 

7. end while 

A remark is in order here about the two pre- 
cisions, e ro i and e a bs- Naively what one imposes 
is the relative precision: the result is supposed to 
be accurate to, say, one part in a thousand, i.e. 
EreJ = 10~ 3 . For integral values approaching zero, 
however, this goal becomes harder and harder to 
reach, and so as not to spend inordinate amounts 
of time in such cases, an absolute precision e a b s 
can be prescribed, where typically e a b s <C £ rc i- 

2.4. Importance Sampling 

Importance sampling introduces a weight func- 
tion into the integral: 

w(x) > , Iw = 1 , 

with two requirements: a) one must be able to 
sample from the distribution w(x), b) f/w should 
be "smooth" in the sense that a w (f/w) < cr(f), 
e.g. w and / should have the same peak structure. 
The ideal choice is known to be w(x) — \f(x)\/If 
which has a w (f/w) = 0, but is of little use as it 
requires a-priori knowledge of the integral value. 

2.5. Stratified Sampling 

Stratified sampling works by sampling subre- 
gions. Consider a total of n points sampled in a 
region r = r a + r/, vs. n/2 points sampled in r a 
and tz/2 in rf,. In the latter case the variance is 

Ifgjf 1 ° 3 b f\_ <r 2 af + ° 2 b f (q] 
4 \n/2 n/2 ) 2n 1 ' 

whereas in the former case it can be written as 

° 2 f = vlf + °l! , (la/ - hf) 2 Q) 
n 2n An 

Even in this simple example the latter variance 
is at best equal to the former one, and only if 
the integral values are identical. The optimal re- 
duction of variance can be shown to occur for 
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n a/ n b — °~af /o~bf EH The recipe is thus to split 
up the integration region into parts with equal 
variance, and then sample all parts with same 
number of points. 

2.6. Quasi-Monte Carlo Methods 

Quasi-Monte Carlo methods are based on the 
Koksma-Hlawka inequality which states an up- 
per bound on the error of an integration formula 

Qn/ = n Si=l f(. X i)i 

|Q„/-I/|<V(/)D*(*i,...,*n). (11) 

Apart from choosing a different integrand there 
is little one can do about V{f), the "variation in 
the sense of Hardy and Krause." The discrepancy 
D* of a sequence S\, . . . ,x n is defined as 



Sobol Quasi-Random Numbers 



D* 



sup 

•6 [0,1]- 



v(r) 



Voir 



(12) 



where i/(r) counts the Xi that fall into r. The 
word "equidistributed" indeed commonly means 
that v{r) is proportional to Voir. Quasi-random 
sequences can be constructed with a substantially 
lower discrepancy than (pseudo-) random num- 
bers. A Monte Carlo algorithm based on these 
sequences typically achieves convergence rates of 
O(log d_1 n/n) rather than the usual 0(l/y/n). 

Cuba offers a choice of quasi-random Sobol se- 
quences 0] or pseudo-random Mersenne Twister 
sequences |5] for all Monte Carlo algorithms. Fig- 
ure^shows that quasi-random numbers cover the 
plane much more homogeneously than pseudo- 
random numbers. 

2.7. Lattice Methods 

Lattice methods require a periodic integrand, 
usually obtained by applying a periodizing trans- 
formation (Cuba's Divonne uses x — > \2x — ID- 
Sampling is done on an integration lattice L 
spanned by a carefully selected integer vector z: 



n — 1 

i=0 

{x} = fractional part of ; 



(13) 



z is chosen (by extensive computer searches) to 
knock out as many low-order "Bragg reflections" 




n = 3000 n = 4000 

Mersenne Twister Pseudo-Random Numbers 



Figure 1. Comparison of sequences. 



as possible in the error term (see e.g. [§]): 
L n f -If=J2 /(*) L «e 2Tifc ^ - 1(0) 

= J2 /(*)» ( 14 ) 

keL ± , fe^O 

where L = {k E 7* d : k ■ z = (mod n)} is the 
reciprocal lattice. 

3. Implementation 

3.1. Vegas 

Vegas is Lepage's classic Monte Carlo algo- 
rithm 7 . It uses importance sampling for vari- 
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ance reduction for which it iteratively builds up 
a piecewise constant weight function, represented 
on a rectangular grid. Each iteration consists of 
a sampling step followed by a refinement of the 
grid. 

In Cuba's implementation Vegas can memo- 
rize its grid for subsequent invocations and it can 
save its internal state intermittently such that the 
calculation can be resumed e.g. after a crash. 

3.2. Suave 

Suave is a cross-breed of Vegas and Miser [H] , 
a Monte Carlo algorithm which combines Vegas- 
style importance sampling with globally adaptive 
subdivision. 

The algorithm works as follows: Until the re- 
quested accuracy is reached, bisect the region 
with the largest error along the axis in which the 
fluctuations of the integrand are reduced most. 
Prorate the number of new samples in each half 
for its fluctuation. 

The Vegas grid is kept across divisions, i.e. a 
region which is the result of n — 1 subdivisions 
has had n Vegas iterations performed on it. On 
the downside, Suave is somewhat memory hungry, 
as it needs to retain samples for later use. 

3.3. Divonne 

Divonne is a significantly extended version of 
CERNlib's Algorithm D151 0. It is essentially 
a Monte Carlo algorithm but has cubature rules 
built in for comparison, too. Variance reduction 
is by stratified sampling, which is aided by meth- 
ods from numerical optimization. Divonne has a 
three-phase algorithm: 

Phase 1: Partitioning 

The integration region is split into subregions 
of (approximately) equal spread s, defined as 

Voir / _ \ 

s(r) = — - — ( max /(a?) — min f(x) J . (15) 

2 V x£r x£r / 

Minimum and maximum of each subregion are 
sought using methods from numerical optimiza- 
tion (essentially a quasi-Newton search). Then, 
'dividers' are moved around (see picture) to find 
the optimal splitting. This latter procedure can 
cleverly be translated into the solution of a linear 
system and is hence quite fast (for details sec 9 ) . 



Phase 2: Sampling 

The subregions determined in Phase 1 are in- 
dependently sampled with the same number of 
points each. The latter is extrapolated from the 
results of Phase 1. 

Phase 3: Refinement 

Regions whose results from Phase 1 and 2 do 
not agree within their errors are subdivided or 
sampled again. This phase is an addition to the 
original algorithm since it was found that often 
enough the error estimate, or even the integral 
value, was off because characteristics of the inte- 
grand had not been found in Phase 1. 

Two important features have been added in the 
Cuba implementation: 

• The user can point out extrema for tricky 
integrands. 

• For integrands which cannot be sampled too 
close to the border, a 'safety distance' can 
be prescribed within which values will be 
extrapolated from two points in the interior. 

3.4. Cuhre 

Cuhre is a deterministic algorithm. It uses the 
Genz-Malik cubature rules !2. in a globally adap- 
tive subdivision scheme. The algorithm is thus: 
Until the requested accuracy is reached, bisect the 
region with the largest error along the axis with 
the largest fourth difference. 

Cuhre has been re-implemented in Cuba 
mostly for a consistent interface, it is the same 
as the original DCUHRE subroutine |10j . 

4. Comparison 

Doing a balanced comparison on integration 
algorithms is nearly hopeless. Performance de- 
pends highly on the integrand and there are al- 
ways cases, and not just academic ones, in which 
one routine outperforms the others, or conversely, 
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in which one routine simply gives wrong results. 
This, of course, is the main reason why there are 
four independent and easily interchangeable algo- 
rithms in the Cuba library. 

In this context it should be pointed out that 
the statistical error estimate quoted by the Monte 
Carlo algorithms merely states the one-sigma in- 
terval, or in other words: the probability that the 
central value lies in the given interval is (only) 
about 68%. 

With these caveats, the following plot compares 
the performance of the four Cuba routines on 
a real phase-space integration. The results ob- 
tained by the four methods (not shown here) in- 
deed agree to within the requested accuracy. 

Integrand evaluations 

10 6 
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5. Mathematica interface 



The Mathematica interface deserves a special 
mention, as it is not a library in the proper sense. 
It is rather four executables which communicate 
with Mathematica via the MathLink API: 

Mathematica C 
MathLink 



Vegas [/, 



.] 



integrand / 

(compiled function) 



{X1,X2,-..} 



{/l,/2,...} 



void Vegas (. . . ) 



request samples 



After loading the appropriate MathLink exe- 
cutable, e.g. with Install ["Vegas"] , the corre- 
sponding routine can be used almost like Math- 
ematica's native NIntegrate. The integrand 
is evaluated completely in Mathematica, which 
means that one can do things like 



Cuhre [Zeta[x y] , {x,2,3>, {y,4,5>] 



6. Further Tools 



6.1. Chooser 

Cuba includes a "one-stop interface" 
further simplifies the invocation: 



which 



subroutine Cuba (method, ndim, ncomp, 
integrand, integral, error, prob) 

The user just has to choose method = 1,2,3,4 
to switch between Vegas, Suave, Divonne, Cuhre. 
All parameters specific to individual routines are 
"hidden" (determined inside the routine), i.e. this 
is not a finished product, but should be adapted 
by the user. 

6.2. Partition Viewer 

Cuba's Partition Viewer displays the partition 
taken by the integration algorithm. This is some- 
times helpful to visualize where the integrand's 
characteristic regions lie. It is really useful only 
in small to moderate dimensions, though. 

Verbosity level 3 must be chosen in the inte- 
gration routine and the output piped through the 
partview utility, as in 

myprogram I partview 1 2 

which will then display the 1-2 plane of the par- 
titioning. Figure |21 shows a screenshot. 

7. Summary 

Cuba is a library for multidimensional numeri- 
cal integration written in C. It contains four inde- 
pendent algorithms: Vegas, Suave, Divonne, and 
Cuhre which have similar invocations and can be 
exchanged easily for comparison. All routines can 
integrate vector integrands and have a Fortran, 
C/C++, and Mathematica interface. Additional 
tools are included, such as a one-stop invoca- 
tion and a partition viewer. Cuba is available 
at http://www.feynarts.de/cuba, licensed un- 
der the LGPL, and easy to build (autoconf). 
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Figure 2. Screenshot of Cuba's partition viewer. 
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